Abstract

Distributed consensus in the Wasserstein metric space of probability measures is introduced
in this work. Convergence of each agent's measure to a common measure is proven under a
weak network connectivity condition. The common measure reached at each agent is one
minimizing a weighted sum of its Wasserstein distances to all initial agent measures. This
measure is known as the Wasserstein barycentre. Special cases involving Gaussian measures,
empirical measures, and time-invariant network topologies are considered, where convergence
rates and average-consensus results are given. This algorithm has potential applicability in
computer vision, machine learning and distributed estimation.


1  Introduction 

The problem of distributed consensus concerns a group of agents that seek to reach agreement
upon certain state variables of interest by exchanging information across a network. Typically
the agents are connected via a network that changes over time due to link failures, node
failures, packet drops, etc. For example, in distributed sensor networks the interaction
topology may change over time because individual nodes (or some subset of them) are mobile or
unreliable, or because communication constraints are present. All such variations in topology
can happen randomly and often the network is disconnected for some time. Studies on the
convergence of consensus algorithms (to a common agreed 'value' at each agent) are often
motivated by such complex time-varying networks.

1.1  Background 

The consensus problem has a long history, e.g. [1], which is too broad to cover here. We
highlight  m  for  further  history  and  background. 

Many consensus algorithms have been proposed in the literature. References [2|!4[f6[f8] focus
on linear update rules (at each agent) and typically concern average-consensus or consensus
about some linear function of all initial agent states in Euclidean space. The average-consensus
problem has a natural relationship with distributed linear least squares or distributed (linear)
maximum likelihood estimation [9] and distributed Kalman filtering PMOI]. Alternative
consensus algorithms using nonlinear update rules have been proposed and studied in [SUMS].
Here consensus to general functions (e.g. the maximum or minimum) of all initial agent
states may be sought as in mm and even finite-time convergence may be achievable [18, 19].
One may also want to achieve consensus to some time-varying reference signal as in [20, 21].

*This work was supported by AFOSR/AOARD via AOARD-144042.

†A.N. Bishop is with the University of Technology Sydney (UTS) and NICTA. He is also an Adjunct Fellow
at the Australian National University (ANU). He is supported by NICTA and the Australian Research Council
(ARC) via a Discovery Early Career Researcher Award (DE-120102873).

‡A. Doucet is with the University of Oxford, Department of Statistics. He is partially supported by an
Engineering and Physical Sciences Research Council Established Career Fellowship.


We note here that the majority of the literature on consensus concerns agreement in
Euclidean space as exemplified by the seminal papers of mm. However, there are exceptions. The
problem of synchronisation is closely related to consensus but typically deals with the problem
of driving a network of oscillators to a common frequency/phase. This work typically concerns
nonlinear manifolds such as the circle. A survey on synchronisation is given in [22] while
consensus and synchronisation are related in [23]. Some other notable exceptions of consensus in
non-Euclidean spaces are [24]-[29]. In particular, [25, 27] consider general nonlinear consensus
on manifolds by embedding such manifolds in a suitable high-dimensional Euclidean space; this
embedding approach is used to perform consensus on the special orthogonal group and on
Grassmann manifolds. The authors in [23, 26, 28] study consensus in different metric spaces,
which is more closely related to the present work. For example, the author of [28]
develops an analogue of Wolfowitz's theorem [3] for a class of metric spaces with non-positive
curvature which leads to a notion of consensus in such spaces.

1.2  Contributions 

The main contributions of this paper are a novel algorithm and convergence results for
distributed consensus in the space of probability measures with time-varying interaction networks.
We introduce a well-studied metric known as the Wasserstein distance which allows us to
consider an important set of probability measures as a metric space [30]. The proposed consensus
algorithm is based on iteratively updating each agent's probability measure by finding a
measure that minimizes the weighted sum of its Wasserstein distances to the agent's own previous
measure plus all neighbour agents' measures. We show that convergence of the individual agents'
measures to a common probability measure is guaranteed under a weak network connectivity
condition. The common measure that is achieved asymptotically at each agent is the one that
is closest simultaneously to all initial agent measures in the sense of the Wasserstein distance.

This work has potential applicability in the fields of distributed estimation, distributed
information fusion and machine learning, among other fields discussed later. For example, suppose
each agent starts with a (posterior) probability measure associated to some common underlying
event of interest. Then one may like to combine all these measures (which amount to each
agent's estimate and/or belief of the underlying event) into a common probability measure that
captures all the agents' beliefs. The proposed consensus algorithm can achieve this in a very
general distributed setting. Related work in [31] considers the application of consensus to the
problem of distributed Bayesian computations. In [32] the consensus algorithm from [31] is
further studied and applied in distributed estimation. Unlike [31], the proposed method here
can deal with singular measures, may be robust to unknown correlations, and the required
connectivity condition is weaker [33, 34]. We note that a Monte Carlo approximation of the
consensus algorithm from [31] was also studied in [35]. In [29] a consensus algorithm on the
Riemannian manifold of (Gaussian) covariance matrices is introduced under the Fisher metric
(related to the Kullback-Leibler distance).

Further  discussion  on  applications  and  related  topics,  particularly  relevant  in  the  Wasserstein 
domain,  is  given  later. 

This  paper  extends  [36]  with  the  addition  of  results  detailing  convergence  rates  and  network 
properties  in  which  particular  consensus  values  may  be  achieved. 

1.3  Paper  Organization 

The main contribution is given in Section 2, where a consensus algorithm in the space of
probability measures is introduced and its convergence is studied. In Section 3 we establish,
for specific scenarios (Gaussian initial measures, empirical measures, and time-invariant
networks), the exponential convergence of this consensus algorithm and discuss various
computational aspects. In Section 4 we discuss potential applications. Concluding remarks are
given in Section 5.


1.4  Notation  and  Conventions 


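Consider a group of agents indexed in $\mathcal{V} = \{1,\ldots,n\}$ and a set of (possibly) time-varying
undirected links $\mathcal{E}(t) \subset \mathcal{V}\times\mathcal{V}$ defining a network graph $\mathcal{G}(t)(\mathcal{V},\mathcal{E}(t))$. The neighbour set at
agent $i$ is denoted by $\mathcal{N}_i(t) = \{j \in \mathcal{V} : (i,j) \in \mathcal{E}(t)\}$. Time is indexed using $\mathbb{N}$.

The graph adjacency matrix $A(t) \in \mathbb{R}^{n\times n}$ obeys $A(t) = A(t)^{\top} = [a_{ij}(t)]$ where
$a_{ij}(t) = 1 \Leftrightarrow (i,j) \in \mathcal{E}(t)$ and $a_{ij}(t) = 0$ otherwise. Implicit throughout is that $a_{ii}(t) = 1$
for all $i$ and $t$ and thus $i \in \mathcal{N}_i(t)$ for all $t$. A weighted adjacency matrix is denoted by
$W(t) = [w_{ij}(t)] \in \mathbb{R}^{n\times n}$ with $a_{ij}(t) = 1 \Leftrightarrow w_{ij}(t) > 0$ and $w_{ij}(t) = 0$ otherwise. We require
$\sum_{j\in\mathcal{N}_i(t)} w_{ij}(t) = 1$ and it holds that $w_{ii}(t) > 0$ for all $i$ and $t$ so that $w_{ij}(t) \in [0,1)$
whenever $i \neq j$ and for all $t$.

The adjacency matrix $A(t)$ defines $\mathcal{G}(t)(\mathcal{V},\mathcal{E}(t))$ and vice versa because $a_{ij}(t) = 1 \Leftrightarrow (i,j) \in \mathcal{E}(t)$
and $a_{ij}(t) = 0 \Leftrightarrow (i,j) \notin \mathcal{E}(t)$. The weighted adjacency matrix defines $\mathcal{G}(t)(\mathcal{V},\mathcal{E}(t))$ because
$w_{ij}(t) > 0 \Leftrightarrow (i,j) \in \mathcal{E}(t)$ and $w_{ij}(t) = 0 \Leftrightarrow (i,j) \notin \mathcal{E}(t)$, but $\mathcal{G}(t)(\mathcal{V},\mathcal{E}(t))$ alone defines only
the sparsity pattern of $W(t)$.

Consider the sequence of graphs $\mathcal{G}(t_k), \mathcal{G}(t_k+1), \ldots, \mathcal{G}(t_k+T)$ on the same vertex set $\mathcal{V}$. The
union of this sequence is denoted by $\mathfrak{G}(t_k, t_k+T)(\mathcal{V}, \cup_{t\in[t_k,t_k+T]}\mathcal{E}(t))$, i.e. $\mathfrak{G}(t_k, t_k+T)$ is just a
graph on the vertex set $\mathcal{V}$ with edges $\cup_{t\in[t_k,t_k+T]}\mathcal{E}(t)$. The sequence is said to be jointly connected
if $\mathfrak{G}(t_k, t_k+T)$ is connected.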
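The conventions above can be illustrated with a small sketch. It forms the union of a sequence of edge sets, tests connectivity of the union, and builds one admissible row-stochastic weight matrix with positive self-weights. The helper names and the uniform weighting are assumptions made here for illustration; the text only requires each row to sum to one with a positive diagonal entry.

```python
from collections import deque

def union_edges(edge_sets):
    """Union of undirected edge sets E(t) over a window of times."""
    union = set()
    for edges in edge_sets:
        for i, j in edges:
            union.add((min(i, j), max(i, j)))
    return union

def is_connected(n, edges):
    """Breadth-first search connectivity test on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, queue = {0}, deque([0])
    while queue:
        v = queue.popleft()
        for u in adj[v] - seen:
            seen.add(u)
            queue.append(u)
    return len(seen) == n

def row_stochastic_weights(n, edges):
    """Uniform weights over each neighbour set, self-loops a_ii = 1 included.

    Uniform weighting is an assumed choice, not one prescribed by the text.
    """
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    return [[a / sum(row) for a in row] for row in A]

# Hypothetical 4-agent example: every snapshot is disconnected on its own,
# yet the sequence is jointly connected.
E_seq = [{(0, 1)}, {(1, 2)}, {(2, 3)}]
print(is_connected(4, union_edges(E_seq[:1])), is_connected(4, union_edges(E_seq)))
```

Each single snapshot leaves at least two agents isolated, while the union over the three time steps is a path graph on all four agents.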

2  Consensus  in  the  Wasserstein  Space  of  Probability  Measures 

The  main  contribution  of  this  work  is  given  in  this  section  where  we  introduce  and  establish  the 
convergence  of  a  consensus  algorithm  in  the  Wasserstein  metric  space  of  probability  measures. 

Suppose the state of agent $i$ is given by a Radon probability measure $\mu_i$ defined on the Borel
sets of $(\mathbb{R}, d)$ where in this section we restrict $d : \mathbb{R}\times\mathbb{R} \to [0,\infty)$ to be the usual Euclidean
distance. Define the space of all such measures on $(\mathbb{R}, d)$ by $\mathcal{U}(\mathbb{R})$ and the subset of all such
measures with finite $p$th moment by $\mathcal{U}_p(\mathbb{R})$ where henceforth we assume that $2 \le p < \infty$. That
is, $\mathcal{U}_p$ is the collection of probability measures such that $\int_{\mathbb{R}} d(x, x_0)^p \, d\mu_i(x) < \infty$ for a given,
arbitrary, $x_0 \in \mathbb{R}$.

One can associate the Wasserstein metric $\ell_p : \mathcal{U}_p(\mathbb{R}) \times \mathcal{U}_p(\mathbb{R}) \to [0,\infty)$ with $\mathcal{U}_p$ which is
defined by

$$\ell_p(\mu_i,\mu_j) = \left( \inf_{\gamma\in\Gamma(\mu_i,\mu_j)} \int_{\mathbb{R}\times\mathbb{R}} d(x_i,x_j)^p \, d\gamma(x_i,x_j) \right)^{1/p}$$

where $\Gamma(\mu_i,\mu_j)$ denotes the collection of all probability measures on $\mathbb{R}\times\mathbb{R}$ with marginals $\mu_i$
and $\mu_j$ on the first and second factors; see [37, 38].
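For measures on the real line the infimum in the Wasserstein metric is attained by the monotone (sorted) coupling, which gives a simple way to evaluate the distance between empirical measures. The following is a minimal sketch under the assumption that both measures have the same number of equally weighted atoms; the function name is introduced here for illustration.

```python
import numpy as np

def wasserstein_p_1d(x, y, p=2):
    """Wasserstein distance between two empirical measures on R.

    Assumes equally many, equally weighted atoms, so that the optimal
    coupling pairs the k-th smallest atom of x with the k-th smallest of y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

# Translating every atom by 2 gives distance 2 for any p.
print(wasserstein_p_1d([0.0, 1.0], [3.0, 2.0]))
```

Unequal sample sizes or weights would require comparing quantile functions on a common grid, which is outside this sketch.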

Let us recall some standard results about the Wasserstein metric space $(\mathcal{U}_p(\mathbb{R}),\ell_p)$ when
$p \ge 2$; see e.g. [37, 40].

1. $(\mathcal{U}_p(\mathbb{R}),\ell_p)$ is a complete and separable metric space.

2. $\lim_{k\to\infty} \ell_p(\mu_k,\mu) = 0$ is equivalent to weak convergence and convergence of the first $p$
moments.

3. Given two measures $\mu_i,\mu_j \in \mathcal{U}_p(\mathbb{R})$ then $\ell_p(\mu_i,\mu_j) = \ell_p(\mu_i,\mu) + \ell_p(\mu_j,\mu)$ for some
$\mu \in \mathcal{U}_p(\mathbb{R})$.

4. More generally, there exists a continuously parameterised constant speed path $\mu_s \in \mathcal{U}_p(\mathbb{R})$,
$s \in [0,1]$ such that for $\mu_i,\mu_j \in \mathcal{U}_p$ we have $\mu_{s=0} = \mu_i$ and $\mu_{s=1} = \mu_j$ and
$\ell_p(\mu_i,\mu_j) = \ell_p(\mu_i,\mu_s) + \ell_p(\mu_j,\mu_s)$, $\forall s \in [0,1]$. The measure $\mu_s$ is known as the interpolant
measure [41].

5. The interpolant measure defines a geodesic and consequently $(\mathcal{U}_p,\ell_p)$ is geodesic.

6. $(\mathcal{U}_p(\mathbb{R}),\ell_p)$ has vanishing curvature in the sense of Alexandrov (a subset of CAT(0)); see
Proposition 4.1 in [39].

7. $(\mathcal{U}_p(\mathbb{R}),\ell_p)$ is simply connected; see [39, 40].
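Properties 3 to 5 can be checked numerically for empirical measures on the real line. The sketch below assumes equally many, equally weighted atoms, in which case the interpolant moves each sorted atom of one measure linearly towards the matching sorted atom of the other. The helper names and the sample data are illustrative only.

```python
import numpy as np

def w_p(x, y, p=2):
    """Wasserstein distance on R for equal-size, equally weighted samples."""
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

def interpolant(x, y, s):
    """Atoms of the interpolant mu_s between two empirical measures."""
    return (1.0 - s) * np.sort(x) + s * np.sort(y)

rng = np.random.default_rng(0)
mu_i = rng.normal(0.0, 1.0, size=200)   # hypothetical samples of mu_i
mu_j = rng.normal(3.0, 0.5, size=200)   # hypothetical samples of mu_j
total = w_p(mu_i, mu_j)
for s in (0.0, 0.25, 0.5, 1.0):
    mu_s = interpolant(mu_i, mu_j, s)
    # Property 4: the two legs through mu_s add up to the full distance.
    assert abs(w_p(mu_i, mu_s) + w_p(mu_s, mu_j) - total) < 1e-9
    # Constant speed: the distance from mu_i grows linearly in s.
    assert abs(w_p(mu_i, mu_s) - s * total) < 1e-9
```

The same identities hold for other values of p in this one-dimensional setting because the sorted pairing is optimal for every convex cost.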


All metrics are continuous and we recall that a constant speed geodesic in $(\mathcal{U}_p(\mathbb{R}),\ell_p)$ is a
curve $\mu_s : I \to \mathcal{U}_p$ parameterised on some interval $I \subset \mathbb{R}$ that satisfies $\ell_p(\mu_{s_i},\mu_{s_j}) = v|s_i - s_j|$
for some constant $v > 0$ and for all $s_i, s_j \in I$.

Suppose the measure at agent $i$ is updated by

$$\mu_i(t+1) = \operatorname*{argmin}_{\eta\in\mathcal{U}_p(\mathbb{R})} \sum_{j\in\mathcal{N}_i(t)} w_{ij}(t)\,\ell_p(\eta,\mu_j(t))^p \qquad (1)$$

for all $i \in \mathcal{V}$ where again we require $\sum_{j\in\mathcal{N}_i(t)} w_{ij}(t) = 1$ and $w_{ii}(t) > 0$ so that consequently
$w_{ij}(t) \in [0,1)$ whenever $i \neq j$. This operation is well-defined as discussed below.

Application of the update rule (1) to each agent $i \in \mathcal{V}$ corresponds to the proposed nonlinear
(distributed) consensus algorithm.
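To make the update rule concrete, the following sketch runs it for p = 2 on empirical measures on the real line. It relies on the standard fact that, in this one-dimensional setting, the minimiser of a weighted sum of squared Wasserstein distances has a quantile function equal to the weighted average of the input quantile functions. The equal-size, equally weighted atoms, the path graph and the weight values are all assumptions chosen for illustration.

```python
import numpy as np

def barycentre_1d(atom_sets, weights):
    """Minimiser of sum_j w_j * W_2(eta, mu_j)^2 for empirical measures on R.

    Assumes every measure has the same number of equally weighted atoms, so
    the sorted atoms of the minimiser are the weighted mean of the sorted
    input atoms.
    """
    sorted_atoms = np.sort(np.asarray(atom_sets, dtype=float), axis=1)
    return np.asarray(weights, dtype=float) @ sorted_atoms

def consensus_step(states, W):
    """Apply update (1) at every agent; row i of W holds the weights w_ij(t)."""
    return np.stack([barycentre_1d(states, W[i]) for i in range(len(states))])

def diameter(states):
    """Largest pairwise W_2 distance between the agents' measures."""
    s = np.sort(states, axis=1)
    return max(float(np.sqrt(np.mean((a - b) ** 2))) for a in s for b in s)

# Hypothetical path graph 0-1-2 with self-loops and row-stochastic weights.
W = np.array([[0.5, 0.5, 0.0],
              [1 / 3, 1 / 3, 1 / 3],
              [0.0, 0.5, 0.5]])
rng = np.random.default_rng(1)
states = np.stack([rng.normal(m, 1.0, size=100) for m in (-2.0, 0.0, 5.0)])
d0 = diameter(states)
for _ in range(50):
    states = consensus_step(states, W)
print(d0, diameter(states))
```

Non-neighbours enter each agent's update with zero weight, so passing the full state array is equivalent to summing over the neighbour set only. For general p or for measures on higher-dimensional spaces no such closed form is available and a numerical barycentre solver would be needed.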


2.1  Main  Result 

We  state  here  our  main  result. 

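Theorem 1. Consider a group of agents $\mathcal{V}$ and network $\mathcal{G}(t)(\mathcal{V},\mathcal{E}(t))$ where each agent $i$ has
initial state $\mu_i(0) \in \mathcal{U}_p(\mathbb{R})$ and updates its state $\mu_i(t) \in \mathcal{U}_p(\mathbb{R})$ according to (1). If for all $t_0 \in \mathbb{N}$
the graph union $\mathfrak{G}(t_0,\infty)$ is connected then there exists $\mu^* \in \mathcal{U}_p(\mathbb{R})$ such that

$$\lim_{t\to\infty} \ell_p(\mu_i(t),\mu^*) = 0$$

for any $i \in \mathcal{V}$.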

The  proof  of  Theorem  1  is  given  in  the  next  subsection  following  the  provision  of  a  number 
of  supporting  results. 
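The connectivity condition of Theorem 1 allows the network to be disconnected at every single time step. The sketch below illustrates this for p = 2 with empirical measures on the real line, represented by rows of sorted atoms, so that the barycentric update reduces to a weighted average of rows. The switching pattern, the uniform weights and the sample data are assumptions made for this illustration.

```python
import numpy as np

def weights(n, edges):
    """Uniform row-stochastic weights with self-loops (an assumed choice)."""
    A = np.eye(n)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A / A.sum(axis=1, keepdims=True)

def step(Q, W):
    """Update (1) for p = 2 on R, acting on rows of sorted atoms."""
    return W @ Q

n = 4
# Each snapshot has a single link, so no snapshot is connected by itself,
# but the union over any three consecutive times is the path 0-1-2-3.
snapshots = [[(0, 1)], [(1, 2)], [(2, 3)]]
rng = np.random.default_rng(2)
Q = np.sort(rng.normal(size=(n, 50)) + 3.0 * np.arange(n)[:, None], axis=1)
spread0 = np.ptp(Q, axis=0).max()   # largest gap between agents at any quantile
for t in range(300):
    Q = step(Q, weights(n, snapshots[t % 3]))
spread = np.ptp(Q, axis=0).max()
print(spread0, spread)
```

The spread between the agents' quantiles shrinks to numerical zero even though at every time two of the four agents have no neighbour other than themselves.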

2.2  Proof  of  the  Main  Result 

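Note that a subset $\mathcal{X} \subset \mathcal{U}_p(\mathbb{R})$ is convex if every geodesic segment whose endpoints are in $\mathcal{X}$
lies entirely in $\mathcal{X}$. The (closed) convex hull $\mathrm{co}(\mathcal{Y})$ of a subset $\mathcal{Y} \subset \mathcal{U}_p$ is the intersection of all
(closed) convex subsets of $\mathcal{U}_p$ that contain $\mathcal{Y}$.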

Lemma 1. If $\mu_i(t) \in \mathcal{U}_p(\mathbb{R})$ then the operation (1) is well-defined in the sense that it admits a
solution and this solution is unique whenever (at least) one $\mu_j(t) \in \mathcal{U}_p$, $j \in \mathcal{N}_i(t)$ does not give
support to small sets$^1$.

Recall that $(\mathcal{U}_p(\mathbb{R}),\ell_p)$ is CAT(0) (indeed it has the stronger property of vanishing
curvature), in addition to being uniquely geodesic, complete and separable, i.e. $(\mathcal{U}_p(\mathbb{R}),\ell_p)$ is a
Hadamard space. This lemma then follows from the fact that $(\mathcal{U}_p(\mathbb{R}),\ell_p)$ is Hadamard and
Fréchet averages such as defined by operations of the form (1) are well defined in such spaces;
see page 334 in [45]. The existence and uniqueness of solutions to (1) is also discussed in [32]
more generally; see also [43, 44]. It is worth noting in passing the related work in [24, 28] which
deals with similar consensus topics in CAT(0) spaces, and [26] which deals with consensus in a
general class of convex metric spaces.

1 A small set is defined [42] as a set of Hausdorff dimension 0. This condition plays a role only in uniqueness and
it is generally unnecessary B3S3!. However, this requirement does exclude empirical measures on $\mathbb{R}$ which arise
in numerous applications relevant to this work (as discussed later). Luckily, it is generically true (i.e. excluding
particular, non-generic, arrangements) that (1) has a unique solution even in such cases; see [43, 44]. Note if all
inputs are discrete we allow for both common and uncommon supports. Going forward we will not repeatedly
call on the need for (at least) one initial measure to exclude support on small sets and later results may be read
as implicitly assuming uniqueness (or implicitly assuming exclusion of support on small sets).

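The convex hull of the set of measures $\{\mu_i\}$, $i \in \mathcal{V}' \subseteq \mathcal{V}$, is defined by

$$\mathrm{co}(\{\mu_i\}) = \left\{ \operatorname*{argmin}_{\eta\in\mathcal{U}_p(\mathbb{R})} \sum_{i\in\mathcal{V}'} w_i\,\ell_p(\eta,\mu_i)^p \;\Big|\; w_i \ge 0,\ \sum_{i\in\mathcal{V}'} w_i = 1 \right\}.$$

Lemma 2. Consider a collection $\{\mu_i\}$, $i \in \mathcal{V}' \subseteq \mathcal{V}$ of distinct measures in $(\mathcal{U}_p(\mathbb{R}),\ell_p)$. The
convex hull of $\{\mu_i\}$ is $\mathrm{co}(\{\mu_i\}) \subset \mathcal{U}_p(\mathbb{R})$ and is isometric to an $l$-sided convex polygon in $\mathbb{R}^2$ with
$2 \le l \le |\{\mu_i\}|$.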

Before proceeding with the proof we point to [46] for background on comparison triangles
and Alexandrov curvature of metric spaces. We also note that in a general geodesic CAT(0)
space, i.e. some arbitrary geodesic space with non-positive curvature, the preceding lemma is
not true and the convex hull of a 'geodesic triangle' defined by three points in such spaces may
be of dimension greater than two$^2$; see Chapter II.2 in [46].

Proof. Lemma 2 is a simple consequence of the vanishing curvature property of $(\mathcal{U}_p(\mathbb{R}),\ell_p)$. We
elaborate for completeness. $(\mathcal{U}_p(\mathbb{R}),\ell_p)$ has vanishing curvature in the sense of Alexandrov (see
Proposition 4.1 in [39]) which formally means that for any triangle of points $\{\mu_i\}$, $i \in \{i_1,i_2,i_3\}$
and any point on the geodesic $\mu_s \in \mathcal{U}_p(\mathbb{R})$, $s \in [0,1]$ such that, for example, $\mu_{s=0} = \mu_{i_1}$ and
$\mu_{s=1} = \mu_{i_2}$ then the $\ell_p$ distance between $\mu_{i_3}$ and $\mu_s$, $s \in [0,1]$ is the same as the corresponding
Euclidean distance in a comparison triangle in $\mathbb{R}^2$. Consider also any pair of points $\mu_j$ and $\mu_k$
with $\mu_j$ on the geodesic connecting $\mu_{i_1}$ and $\mu_{i_2}$ and $\mu_k$ on the geodesic connecting $\mu_{i_1}$ and $\mu_{i_3}$
with $\{\mu_j,\mu_k\} \cap \{\mu_i\} = \emptyset$, $i \in \{i_1,i_2,i_3\}$. Then vanishing curvature also implies $\angle_{\mu_{i_1}}(\mu_j,\mu_k)$
is equal to the usual interior Euclidean angle at the corresponding vertex in the comparison
triangle in $\mathbb{R}^2$. Here the angle $\angle_{\mu_{i_1}}(\mu_j,\mu_k)$ is the Alexandrov angle in arbitrary metric spaces;
see Chapter II.1 in [46]. It now follows that the convex hull of any triangle of points $\{\mu_i\}$ in
$(\mathcal{U}_p(\mathbb{R}),\ell_p)$ is isometric to a triangle in $\mathbb{R}^2$; e.g. see Proposition 2.9 (Flat Triangle Lemma)
in [46]. Now define $\mathcal{T} = \{\Delta_j\}$ to be the collection of geodesic triangles in $(\mathcal{U}_p(\mathbb{R}),\ell_p)$ defined
by every combination of three points in $\{\mu_i\}$, $i \in \mathcal{V}' \subseteq \mathcal{V}$. Clearly $\mathrm{co}(\{\mu_i\}) = \cup_j \Delta_j$. Consider
also the corresponding collection $\bar{\mathcal{T}} = \{\bar{\Delta}_j\}$ of comparison triangles in $\mathbb{R}^2$. The Flat Triangle
Lemma implies that this collection can be arranged in $\mathbb{R}^2$ such that each angle $\angle_{\mu_i}(\mu_j,\mu_k)$ and
each distance $\ell_p(\mu_i,\mu_j)$ for all $i,j,k \in \mathcal{V}'$ in $(\mathcal{U}_p(\mathbb{R}),\ell_p)$ equals exactly the corresponding angle
or distance in the comparison configuration of points in $\mathbb{R}^2$. Obviously, the convex hull of the
comparison configuration is an $l$-sided convex polygon in $\mathbb{R}^2$ with $2 \le l \le |\{\mu_i\}|$ and equal to
$\cup_j \bar{\Delta}_j$. Define the following map

$$f_{\ell_p,d} : \mathrm{co}(\{\mu_i\}) \to \mathbb{R}^2, \quad i \in \mathcal{V}' \qquad (2)$$

so the restriction

$$f_{\ell_p,d}(\Delta_j) = f_{\ell_p,d}(\mathrm{co}(\{\mu_{j_1},\mu_{j_2},\mu_{j_3}\})) = \mathrm{co}(\{f_{\ell_p,d}(\mu_{j_1}), f_{\ell_p,d}(\mu_{j_2}), f_{\ell_p,d}(\mu_{j_3})\}) = \bar{\Delta}_j$$

$\forall \Delta_j \in \mathcal{T} = \{\Delta_j\}$ is an isometry. Then

$$f_{\ell_p,d}(\mathrm{co}(\{\mu_i\})) = f_{\ell_p,d}(\cup_j \Delta_j) = \cup_j f_{\ell_p,d}(\Delta_j) = \cup_j \bar{\Delta}_j$$

from the Flat Triangle Lemma and the property of vanishing curvature. For any two points in
$\mathrm{co}(\{\mu_i\})$ there exists a $\Delta_j \in \mathcal{T}$ that contains them and the restriction $f_{\ell_p,d}(\Delta_j)$ is an isometry
to a convex subset of $\cup_j \bar{\Delta}_j$. Thus, $f_{\ell_p,d}$ is an isometry and this completes the proof. □

2  Our  Euclidean  intuition  is  generally  wrong  when  it  suggests  the  existence  of  a  two-dimensional  convex  hull 
for  a  triangle  defined  by  three  points  and  the  geodesics  connecting  them  (albeit  this  is  hard  to  visualise  of  course). 


Lemma 3. Consider the convex hull $\mathrm{co}(\{\mu_j(t)\})$, with $j \in \mathcal{N}_i(t)$ at time $t$. If agent $i$ applies
(1) it follows that $\mu_i(t+1)$ is strictly within the convex hull $\mathrm{co}(\{\mu_j(t)\})$ whenever $|\{\mu_j(t)\}| \ge 2$
and two agent states are distinct and $w_{ij}(t) \in (0,1)$.

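Proof. It is enough to consider two agents$^3$ $i,j \in \mathcal{V}$ with (1) then given by

$$\mu_i(t+1) = \operatorname*{argmin}_{\eta\in\mathcal{U}_p(\mathbb{R})} \; w_{ii}(t)\left(\ell_p(\eta,\mu_i(t))^p - \ell_p(\eta,\mu_j(t))^p\right) + \ell_p(\eta,\mu_j(t))^p$$

and to note that $\eta$ must lie on a geodesic $\mu_s : I \to \mathcal{U}_p(\mathbb{R})$. The proof relies on showing that
$\mu_i(t+1) \notin \{\mu_i(t),\mu_j(t)\}$ when $w_{ii},w_{ij} \in (0,1)$. The first term

$$w_{ii}(t)\left(\ell_p(\eta,\mu_i(t))^p - \ell_p(\eta,\mu_j(t))^p\right)$$

is strictly negative at $\eta = \mu_i(t)$ and strictly increasing as $\eta$ moves from $\mu_i(t)$ to $\mu_j(t)$ and
conversely $\ell_p(\eta,\mu_j(t))^p$ is strictly positive at $\eta = \mu_i(t)$ and strictly decreasing to zero as $\eta$
moves from $\mu_i(t)$ to $\mu_j(t)$. Then for any $w_{ii} \in (0,1)$ and because $\ell_p$ is continuous it follows that
there exists some $\mu_\epsilon$ on $\mu_s$ with $\epsilon > 0$ such that

$$w_{ii}(t)\left(\ell_p(\eta,\mu_i(t))^p - \ell_p(\eta,\mu_j(t))^p\right) < 0$$

$$\left|w_{ii}(t)\left(\ell_p(\eta,\mu_i(t))^p - \ell_p(\eta,\mu_j(t))^p\right)\right| < \ell_p(\eta,\mu_j(t))^p$$

on $\eta \in \mu_s$, $s \in [0,\epsilon]$. Consequently, the objective defining $\mu_i(t+1)$ is strictly decreasing on
$\eta \in \mu_s$, $s \in [0,\epsilon]$. Hence for any $w_{ii} \in (0,1)$ the point $\mu_i(t)$ cannot be a minimum. The same
argument applies to $\mu_j(t)$. □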
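The two-agent argument can be made explicit by restricting the objective to the constant speed geodesic joining the two measures, as the proof does. The shorthand below is introduced only for this illustration: $D$ denotes the Wasserstein distance between the two agent measures at time $t$ (assumed positive), $w \in (0,1)$ denotes the self-weight of agent $i$, and $g(s)$ denotes the objective evaluated at the point $\mu_s$ of the geodesic with $\mu_0 = \mu_i(t)$ and $\mu_1 = \mu_j(t)$.

```latex
\begin{align*}
g(s) &= w\,\ell_p(\mu_s,\mu_i(t))^p + (1-w)\,\ell_p(\mu_s,\mu_j(t))^p
      = D^p\left[\,w\,s^p + (1-w)(1-s)^p\,\right], \\
g'(s) &= p\,D^p\left[\,w\,s^{p-1} - (1-w)(1-s)^{p-1}\,\right], \\
g'(0) &= -p\,(1-w)\,D^p < 0, \qquad g'(1) = p\,w\,D^p > 0, \\
s^{*} &= \frac{r}{1+r}, \qquad r = \left(\frac{1-w}{w}\right)^{1/(p-1)} .
\end{align*}
```

The signs of $g'(0)$ and $g'(1)$ show that neither endpoint can be a minimiser, and the stationary point $s^{*}$ lies strictly inside $(0,1)$. For $p = 2$ this gives $s^{*} = 1 - w$, i.e. the updated measure sits a fraction $1-w$ of the way along the geodesic from $\mu_i(t)$ to $\mu_j(t)$.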

The  following  is  a  simple  consequence  of  the  preceding  result. 

Corollary 1. Consider the convex hull $\mathrm{co}(\{\mu_i(0)\})$ of all initial agent states in $(\mathcal{U}_p(\mathbb{R}),\ell_p)$. If
each agent applies (1) it follows that $\mathrm{co}(\{\mu_i(t)\}) \subseteq \mathrm{co}(\{\mu_i(0)\})$ for all $t$.

The next result concerns an important special case of the main result.

Lemma 4. Suppose $\mathcal{G}(\mathcal{V},\mathcal{E})$ is time-invariant and connected. Suppose the state of each agent
is $\mu_i(t) \in \mathcal{U}_p(\mathbb{R})$ and that each agent applies (1). Then there exists $\mu^* \in \mathcal{U}_p(\mathbb{R})$ such that for any
$i \in \mathcal{V}$ it holds that

$$\lim_{t\to\infty} \ell_p(\mu_i(t),\mu^*) = 0.$$

Proof. It almost goes without saying that $\ell_p(\mu_i(t),\mu_j(t))^p = 0$, $\forall i,j \in \mathcal{V}$ with a constant $\mu_i(t)$
in $\mathcal{U}_p(\mathbb{R})$ is an equilibrium state of (1). Consider a Lyapunov-like function $\nu(\mu) : \mathcal{U}_p(\mathbb{R})^n \to \mathbb{R}$
given by

$$\nu(\mu) = \sup_{y,x\in\{\mu_i(t)\}_{i\in\mathcal{V}}} \ell_p(y,x)^p \qquad (3)$$

and note that $\nu(\mu) \ge 0$ with $\nu(\mu) = 0$ if and only if $\mu_i = \mu_j$ for all $i,j \in \mathcal{V}$.$^4$ By Corollary
1 it follows that $\nu(\mu)$ is non-increasing along trajectories of (1). It suffices to show
$\nu(\mu(t+n-1)) < \nu(\mu(t))$ for each $t$. Firstly, pick a $t_0 \ge 0$ and note $\mathrm{co}(\{\mu_i(t_0)\}) \subseteq \mathrm{co}(\{\mu_i(0)\})$ and
$f_{\ell_p,d}(\mathrm{co}(\{\mu_i(t_0)\})) \subseteq f_{\ell_p,d}(\mathrm{co}(\{\mu_i(0)\}))$ from Corollary 1 and where $f_{\ell_p,d}$ is an isometry given
by (2). Without loss of generality, via Lemma 2, suppose that $f_{\ell_p,d}(\mathrm{co}(\{\mu_i(t_0)\}))$ is an $l$-sided
polygon in $\mathbb{R}^2$ with $2 \le l \le |\mathcal{V}|$ on the collection of vertices $\{x_j(t_0)\}$, $j \in \{1,\ldots,l\}$ with
$x_j(t_0) \in \mathbb{R}^2$. If we chose a $t_0$ such that $l = 1$ then we would be done. Define the following
set-valued function

$$h_j(t) = \{i \in \mathcal{V} : f_{\ell_p,d}(\mu_i(t)) = x_j(t_0)\}, \quad \forall j \in \{1,\ldots,l\} \qquad (4)$$

for each time $t \ge t_0$. It is immediate from Lemma 3 that $h_j(t+1) \subseteq h_j(t)$ for all $j \in \{1,\ldots,l\}$;
i.e. more generally, no agent state $f_{\ell_p,d}(\mu_i(t))$ which is not on the boundary of the $l$-sided
polygon at time $t$ can ever reach this same boundary at $t+1$ as a consequence of Lemma
3. Note that $|h_j(t_0)| \le n-1$ for all $j \in \{1,\ldots,l\}$ with $l \ge 2$ at $t_0$. Recall the neighbour
set at agent $i$ is given by $\mathcal{N}_i(t)$. Because the network is connected, for each $k \in h_j(t_0)$ the
neighbour set obeys $\mathcal{N}_k(t_0) \neq \emptyset$ for each $j \in \{1,\ldots,l\}$. Then by Lemma 3 it follows that
$h_j(t_0+1) \subset h_j(t_0)$ since at least one $k \in h_j(t_0)$ must be connected to an agent outside $h_j(t_0)$
and this agent's state must change, $\mu_k(t_0) \neq \mu_k(t_0+1)$, as a consequence of Lemma 3 such that
$f_{\ell_p,d}(\mu_k(t_0+1)) \neq x_j$. At the next time $t_0+1$ it holds again that for each $k \in h_j(t_0+1)$
(assuming $h_j(t_0+1) \neq \emptyset$) the neighbour set obeys $\mathcal{N}_k(t_0+1) \neq \emptyset$ for each $j \in \{1,\ldots,l\}$.
Then by application of Lemma 3 it follows again that $h_j(t_0+2) \subset h_j(t_0+1) \subset h_j(t_0)$. Thus,
$h_j(t+1) \subset h_j(t)$ is a strictly decreasing set-valued function unless $h_j(t) = \emptyset$. By at most time
$t_0+n-1$ it follows that $h_j(t_0+n-1) = \emptyset$ and the argument can reset by redefining $t_0$. It
follows that $f_{\ell_p,d}(\mathrm{co}(\{\mu_i(t_0+n-1)\})) \subset f_{\ell_p,d}(\mathrm{co}(\{\mu_i(t_0)\}))$ for all $t_0 \ge 0$. Following the proof of
Lemma 2 we know $\mathrm{co}(\{\mu_i(t_0+n-1)\}) \subset \mathrm{co}(\{\mu_i(t_0)\})$ and thus because we chose $t_0$ arbitrarily
$\nu(\mu(t+n-1)) < \nu(\mu(t))$ for each $t \in \mathbb{N}$ unless $\mu_i(t+n-1) = \mu^*$, $\forall i$, as desired. The existence
of a strictly decreasing Lyapunov function completes the proof. □

3 This is because Fréchet averages in Euclidean space (on a set of input points) are associative, and can be
found iteratively by computing the average initially for a pair of points (in a larger set of inputs), and then
computing the average between this result and the next point (in the input set), and so on (adjusting the weights
defining the average each time), see [33]. Owing to Lemma 2, this associativity property still holds here.

4 We abuse notation here slightly. We use the shorthand $\mu$ as the argument in $\nu$ to represent the collection
of all agent measures $\{\mu_i(t)\}_{i\in\mathcal{V}}$ with $|\mathcal{V}| = n$.

The  preceding  lemma  specialises  the  Theorem  to  the  case  where  the  network  topology  is 
connected  and  time-invariant  (but  otherwise  arbitrary).  This  lemma  is  of  interest  on  its  own 
in  many  applications  in  which  the  topology  is  static.  Proof  of  this  lemma,  given  Lemmas  1-3, 
follows  roughly  the  analysis  of  [5]  on  nonlinear  consensus  in  the  usual  Euclidean  metric  space. 

We  are  now  ready  to  prove  Theorem  1. 

Proof (of Theorem 1). The proof here relies on extending the previous lemma to the case
where $\mathcal{G}(t)(\mathcal{V},\mathcal{E}(t))$ is time-varying and for all $t_0 \in \mathbb{N}$ the graph union $\mathfrak{G}(t_0,\infty)$ is connected.
Recall the same Lyapunov function (3) as used in the proof of Lemma 4 (we assume familiarity
with the proof of Lemma 4 going forward).

We note that it suffices to show that there is a countably infinite number of finite time
intervals $t \in [t_0^q,\bar{t}_0^q]$, $q \in \mathbb{N}$ such that $\nu(\mu(\bar{t}_0^q)) < \nu(\mu(t_0^q))$.

Pick $t_0^q \ge 0$, $q \in \mathbb{N}$ so $f_{\ell_p,d}(\mathrm{co}(\{\mu_i(t_0^q)\}))$ is an $l$-sided polygon in $\mathbb{R}^2$ with $2 \le l \le |\mathcal{V}|$ on
the collection of vertices $\{x_j(t_0^q)\}$, $j \in \{1,\ldots,l\}$ with $x_j(t_0^q) \in \mathbb{R}^2$. Recall (4). Then define a
sequence of times $\{t^q_{s(j)}\}$, $s(j) \in \mathbb{N}$ each greater than $t_0^q$ for each $j \in \{1,\ldots,l\}$ with $l \ge 2$. The
connectivity condition implies the existence of such a sequence for each $j$ with the property
that, if $h_j(t^q_{s(j)}) \neq \emptyset$, there exists a $k \in h_j(t^q_{s(j)})$ that is connected to an agent outside $h_j(t^q_{s(j)})$.
Then, this agent's state must change, $\mu_k(t^q_{s(j)}) \neq \mu_k(t^q_{s(j)}+1)$, as a consequence of Lemma 3 and
$f_{\ell_p,d}(\mu_k(t^q_{s(j)}+1)) \neq x_j(t_0^q)$. Then $h_j(t^q_{s(j)}+1) \subset h_j(t^q_{s(j)})$ for all $j \in \{1,\ldots,l\}$ unless obviously
$h_j(t^q_{s(j)}) = \emptyset$. As in the proof of Lemma 4 it holds that $s(j) \ge n-1$ implies $h_j(t^q_{s(j)}+1) = \emptyset$ for
all $j$. Let $\bar{t}_0^q = \min\{t \in \mathbb{N} : t \ge t^q_{s(j)},\ s(j) \ge n-1,\ \forall j\}$ and note then that the interval $t \in [t_0^q,\bar{t}_0^q]$ is
finite owing to the connectivity condition. Moreover, as in the proof of Lemma 4 one can then
show that $\nu(\mu(\bar{t}_0^q)) < \nu(\mu(t_0^q))$. Restart the argument by picking $t_0^{q+1}$ to be equal or sufficiently
close to $\bar{t}_0^q$ and note that the connectivity condition then implies the number of such (finite)
intervals $t \in [t_0^q,\bar{t}_0^q]$ is countably infinite on $q \in \mathbb{N}$.

We thus have a strictly decreasing Lyapunov function $\nu(\mu(\bar{t}_0^q)) < \nu(\mu(t_0^q))$ on the sequence
of finite intervals $t \in [t_0^q,\bar{t}_0^q]$, $q \in \mathbb{N}$ and this completes the proof. □


3  Special  Cases  and  Convergence  Details 

Firstly,  given  Theorem  1,  it  is  worth  noting  the  following  result. 

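Proposition 1. Consider a group of agents $\mathcal{V}$ and network $\mathcal{G}(t)(\mathcal{V},\mathcal{E}(t))$ where each agent $i$ has
initial state $\mu_i(0) \in \mathcal{U}_p(\mathbb{R})$ and updates its state $\mu_i(t)$ according to (1). Suppose for all $t_0 \in \mathbb{N}$
the graph union $\mathfrak{G}(t_0,\infty)$ is connected so that Theorem 1 applies and there exists $\mu^* \in \mathcal{U}_p(\mathbb{R})$
such that $\lim_{t\to\infty} \ell_p(\mu_i(t),\mu^*) = 0$ for any $i \in \mathcal{V}$. Then there exists some symmetric weight
matrix $\overline{W} = [\bar{w}_{ij}] \in \mathbb{R}^{n\times n}$ with $\bar{w}_{ij} \in (0,1)$ and $\sum_{j} \bar{w}_{ij} = 1$ for all $i$ such that for any $i \in \mathcal{V}$

$$\mu^* = \operatorname*{argmin}_{\eta\in\mathcal{U}_p(\mathbb{R})} \sum_{j\in\mathcal{V}} \bar{w}_{ij}\,\ell_p(\eta,\mu_j(0))^p$$

where we emphasize that $\overline{W}$ is not (generally) the same as $W(t)$ but it is solely dependent on
the sequence $W(t)$, $t \in \mathbb{N}$ and (possibly) the initial measures $\{\mu_i(0)\}_{i\in\mathcal{V}}$.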

Proof of this proposition is straightforward given the actual convergence result stated in
Theorem 1. This result states that the common measure which all agent states converge to is
within the convex hull of all initial agent measures in $\mathcal{U}_p(\mathbb{R})$.

An interesting open problem is how one can design the evolution of $W(t)$, $t \in \mathbb{N}$ such
that for a set of measures $\{\mu_i(0)\}_{i\in\mathcal{V}}$ the final weighting matrix $\overline{W}$ specifies a limit $\mu^*$, i.e.
$\lim_{t\to\infty} \ell_p(\mu_i(t),\mu^*) = 0$ for any $i \in \mathcal{V}$, that is optimal, or desired, in some sense (e.g. minimum
variance over all possible $\overline{W}$ given $\mu_i(0) \in \mathcal{U}_p(\mathbb{R})$, $i \in \mathcal{V}$).

In the remainder of this section we consider convergence to particular limits of interest, e.g.
to a limit equally close to all agents' initial measures. We also consider convergence speeds and
computational aspects of the update protocol for given classes of input measures.

3.1  Convergence  with  Gaussian  Measures 

In this subsection we consider the evolution of the operation (1) at each agent $i \in \mathcal{V}$ when $\mu_i(0)$
is a Gaussian measure. We consider the case in which $p = 2$ and firstly consider measures on
$\mathcal{U}_2(\mathbb{R}^m)$ for which the Wasserstein metric definition can be straightforwardly extended.

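Lemma 5. 'JBj Suppose that $\mu_i(t) \in \mathcal{U}_2(\mathbb{R}^m)$ for all $i \in \mathcal{V}$ admits a Gaussian probability
density of the form $\mathcal{N}(p_i,P_i)$. Then the solution to

$$\mu_i(t+1) = \operatorname*{argmin}_{\eta\in\mathcal{U}_2(\mathbb{R}^m)} \sum_{j\in\mathcal{N}_i(t)} w_{ij}(t)\,\ell_2(\eta,\mu_j(t))^2$$

is the Gaussian measure $\mu_i(t+1) \in \mathcal{U}_2(\mathbb{R}^m)$ of density $\mathcal{N}(q,Q)$ where $q = \sum_{j\in\mathcal{N}_i(t)} w_{ij}(t)\,p_j$
and $Q$ is the unique positive-definite solution to

$$Q = \sum_{j\in\mathcal{N}_i(t)} w_{ij}(t)\left(Q^{1/2} P_j Q^{1/2}\right)^{1/2}$$

which is guaranteed to exist.

In the scalar case it follows that $\mu_i(t+1) \in \mathcal{U}_2(\mathbb{R})$ is a Gaussian measure of density $\mathcal{N}(q,Q)$
where the variance is given by

$$Q = \left(\sum_{j\in\mathcal{N}_i(t)} w_{ij}(t)\,P_j^{1/2}\right)^2$$

and the updated mean $q$ is given as in Lemma 5, i.e. $q = \sum_{j\in\mathcal{N}_i(t)} w_{ij}(t)\,p_j$.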
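The covariance equation in Lemma 5 is implicit in $Q$. One way to solve it numerically is a fixed-point iteration known from the Gaussian barycentre literature; it is not taken from this paper and is used here as an assumption. The sketch below implements that iteration with NumPy only, together with a residual function that measures how well a candidate $Q$ satisfies the equation of Lemma 5. All function names and the example data are hypothetical.

```python
import numpy as np

def sqrtm_psd(A):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def gaussian_barycentre(means, covs, weights, iters=100):
    """Return (q, Q) for the weighted barycentre of Gaussians N(p_j, P_j).

    Uses the iteration Q <- Q^{-1/2} (sum_j w_j (Q^{1/2} P_j Q^{1/2})^{1/2})^2 Q^{-1/2},
    whose fixed points satisfy the covariance equation of Lemma 5.
    """
    w = np.asarray(weights, dtype=float)
    means = [np.asarray(p, dtype=float) for p in means]
    covs = [np.asarray(P, dtype=float) for P in covs]
    q = sum(wj * pj for wj, pj in zip(w, means))
    Q = sum(wj * Pj for wj, Pj in zip(w, covs))   # initial guess (assumed)
    for _ in range(iters):
        R = sqrtm_psd(Q)
        R_inv = np.linalg.inv(R)
        K = sum(wj * sqrtm_psd(R @ Pj @ R) for wj, Pj in zip(w, covs))
        Q = R_inv @ K @ K @ R_inv
    return q, Q

def residual(Q, covs, weights):
    """Frobenius norm of Q - sum_j w_j (Q^{1/2} P_j Q^{1/2})^{1/2}."""
    R = sqrtm_psd(Q)
    K = sum(wj * sqrtm_psd(R @ np.asarray(Pj, dtype=float) @ R)
            for wj, Pj in zip(weights, covs))
    return float(np.linalg.norm(Q - K))

# Commuting (diagonal) covariances: each axis behaves like the scalar case,
# so the standard deviations are averaged axis by axis.
q, Q = gaussian_barycentre([[0.0, 0.0], [2.0, 4.0]],
                           [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])],
                           [0.5, 0.5])
print(q, np.diag(Q))
```

In the diagonal example the result agrees with the scalar formula stated after Lemma 5. For non-commuting covariances the `residual` function can be used to check how closely the iterate satisfies the equation.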


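Proposition 2. Consider a group of agents $\mathcal{V}$ and a connected time-invariant network $\mathcal{G}(\mathcal{V},\mathcal{E})$.
Assume $W$ is doubly stochastic and that $\mu_i(0) \in \mathcal{U}_2(\mathbb{R})$ admits a Gaussian density $\mathcal{N}(p_i(0),P_i(0))$.
Then $\mu_i(t+1) \in \mathcal{U}_2(\mathbb{R})$ in (1) admits a Gaussian density $\mathcal{N}(p_i(t+1),P_i(t+1))$ where

$$p_i(t+1) = \sum_{j\in\mathcal{N}_i} w_{ij}\,p_j(t), \qquad P_i(t+1)^{1/2} = \sum_{j\in\mathcal{N}_i} w_{ij}\,P_j(t)^{1/2}.$$

Moreover we have for any $i \in \mathcal{V}$ that $\lim_{t\to\infty} \ell_2(\mu_i(t),\mu^*) = 0$ exponentially fast where $\mu^*$ satisfies

$$\mu^* = \operatorname*{argmin}_{\eta\in\mathcal{U}_2(\mathbb{R})} \frac{1}{n}\sum_{i\in\mathcal{V}} \ell_2(\eta,\mu_i(0))^2.$$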

Proof. With Gaussian initial measures the update operation collapses to the standard linear consensus update (on the mean and the standard deviation) as shown in Lemma 5. Moreover, linear consensus in $\mathbb{R}$ over a time-invariant network with a doubly stochastic weighting matrix leads asymptotically to 'average' consensus [9]. Thus, in this case, the mean and standard deviation converge to the averages of all initial agent values and the result follows. □
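
The behaviour described in Proposition 2 can be illustrated with a small simulation. This is a sketch only: the four-agent ring, the weight matrix `W`, and the initial means and variances are assumptions chosen for illustration, not values from the paper. `W` is symmetric with unit row sums, hence doubly stochastic.

```python
import math


def consensus_step(W, values):
    """One linear consensus step: values_i <- sum_j W[i][j] * values_j."""
    return [sum(w * v for w, v in zip(row, values)) for row in W]


# Hypothetical 4-agent ring; each agent weights itself and its two neighbours.
W = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]

means = [0.0, 1.0, 4.0, 7.0]                            # assumed initial means p_i(0)
stds = [math.sqrt(P) for P in [1.0, 4.0, 9.0, 16.0]]    # square roots of assumed P_i(0)

for _ in range(60):
    means = consensus_step(W, means)   # linear consensus on the means
    stds = consensus_step(W, stds)     # linear consensus on the standard deviations

variances = [s ** 2 for s in stds]
print(means, variances)
```

Every agent ends at the average of the initial means (3.0) and at the square of the average initial standard deviation (2.5 squared, i.e. 6.25), which differs from the average of the initial variances (7.5).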

We  have  introduced  a  special  scenario  in  the  above  proposition  involving  time-invariant 
networks  and  then  provided  a  particular  (though  popular)  scalar  average  consensus  algorithm  [7] 
to  update  the  mean  and  variance  (standard  deviation)  at  each  iteration.  More  general  results  on 
average  consensus  that  allow  for  time-varying  networks,  finite-time  convergence  etc.  ^[6, 71119] 
may  be  substituted  (but  we  do  not  explore  this  topic  further  here). 

Although the update is linear (in mean and standard deviation) and closed in the event of Gaussian input measures, this is not generally the case, and the consensus problem is, in general, inherently nonlinear. Nevertheless, we subsequently show that certain convergence properties of the update at each agent follow by analysing a linear consensus protocol.

3.2  General  Convergence  Speeds  and  Average  Wasserstein  Consensus 

In this subsection we consider only undirected, connected, and time-invariant network graphs $\mathcal{G}(\mathcal{V}, \mathcal{E})$. We consider only measures with finite second moment and we work solely in the Wasserstein metric space denoted by $(\mathcal{U}_2(\mathbb{R}), \ell_2)$.

The first result considers the convergence speed of the entire group of agents under the update protocol.

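Proposition 3. Consider a group of agents $\mathcal{V}$ and a connected time-invariant network $\mathcal{G}(\mathcal{V}, \mathcal{E})$ where each agent $i$ updates its state $\mu_i(t) \in \mathcal{U}_2(\mathbb{R})$ according to the update protocol. Then there exists $\mu^* \in \mathcal{U}_2(\mathbb{R})$ such that for any $i \in \mathcal{V}$,

$$\lim_{t \to \infty} \ell_2(\mu_i(t), \mu^*) = 0$$

at an exponential rate.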

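Proof. If $\mu_i(0) \in \mathcal{U}_2(\mathbb{R})$ then the solution to the update at any $i \in \mathcal{V}$ and any $t \in \mathbb{N}$ can be written in the form

$$\mu_i(t+1)(M) = \Big( \sum_{j \in \mathcal{N}_i} w_{ij}\, T_i^j(t) \Big) \# \mu_i(t)(M)$$

for all Borel sets $M$ on $(\mathbb{R}, d)$; see [32]. Here $(T_i^j(t)) \# \mu_i(t)$ denotes the push forward of $\mu_i(t)$ to $\mu_j(t)$ through the non-decreasing measurable map $T_i^j(t) : \mathbb{R} \to \mathbb{R}$ such that $(T_i^j(t)) \# \mu_i(t) = \mu_j(t)$. Obviously, we have $(T_i^i(t)) \# \mu_i(t) = \mu_i(t)$. For any measure $\psi(t)$ dominated by the Lebesgue measure on $\mathbb{R}$ it follows that

$$\mu_i(t+1)(M) = \Big( \sum_{j \in \mathcal{N}_i} w_{ij}\, T_j(t) \Big) \# \psi(t)(M)$$

where $(T_j(t)) \# \psi(t)$ denotes the push forward of $\psi(t)$ to $\mu_j(t)$; see [35]. Write the cumulative distribution function for each $\mu_i \in \mathcal{U}_2(\mathbb{R})$ by $F_i(x) : \mathbb{R} \to \mathbb{R}$ and $F_i(x) = \mu_i((-\infty, x])$. Define its inverse $F_i^+(x) : \mathbb{R} \to \mathbb{R}$ by

$$F_i^+(x) = \inf_{y} \{ y \in \mathbb{R} : F_i(y) > x \}$$

for all $x \in \mathbb{R}$. One can show [02] that $T_i^j(t) = F_j^+ \circ F_i$ or, if $\psi(t)$ is uniform on $[0, 1]$, then $T_j(t) = F_j^+$ and

$$\mu_i(t+1)(M) = \Big( \sum_{j \in \mathcal{N}_i} w_{ij}\, F_j^+ \Big) \# \psi(t)(M)$$

with $\psi(t)$ uniform on $[0, 1]$. It follows directly that the solution to the update at any $i \in \mathcal{V}$ and any $t \in \mathbb{N}$ has an inverse cumulative distribution function given by

$$F_i^+(t+1)(x) = \sum_{j \in \mathcal{N}_i} w_{ij}\, F_j^+(t)(x) \qquad (5)$$

for all $i \in \mathcal{V}$.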

Now one can stack these functions so $F^+(t+1)(x) = W F^+(t)(x)$ for all $x \in \mathbb{R}^n$. From the assumed network connectivity condition and the weighting assumptions we conclude that $W$ is row-stochastic and primitive with a distinct maximum eigenvalue of 1. The remaining $n - 1$ eigenvalues have an absolute value strictly less than 1. The convergence rate of $F^+(t)(x)$ is determined by the convergence rate of $W^t$ to the rank one matrix $\mathbf{1}u^\top$ associated with the maximum eigenvalue. Writing

$$W^t = \sum_{i=1}^{n} \lambda_i^t v_i u_i^\top = \mathbf{1}u^\top + \sum_{i=2}^{n} \lambda_i^t v_i u_i^\top$$

where $\lambda_i$ is the $i$'th eigenvalue of $W$, it then follows that $\| W^t - \mathbf{1}u^\top \| = \| \sum_{i=2}^{n} \lambda_i^t v_i u_i^\top \|$ vanishes exponentially at a rate dominated by the absolute value of the second largest eigenvalue (which is strictly less than 1) and the proof is complete. □
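
The argument in the proof can be illustrated numerically by running linear consensus on sampled inverse cumulative distribution functions. The sketch below makes several assumptions that are not in the paper: three agents on a complete graph, uniform empirical initial measures with four atoms each, a four-point grid on $(0, 1)$, and the helper names `quantile`, `consensus_step` and `disagreement`. The assumed matrix `W` equals $0.25\,\mathbf{1}\mathbf{1}^\top + 0.25\,I$, so its eigenvalues are 1, 0.25 and 0.25.

```python
def quantile(atoms, u):
    """Inverse CDF inf{y : F(y) > u} of the uniform empirical measure on `atoms`."""
    xs = sorted(atoms)
    k = min(int(u * len(xs)), len(xs) - 1)
    return xs[k]


def consensus_step(W, F):
    """Apply the stacked update F(t+1) = W F(t) at every grid point."""
    n_grid = len(F[0])
    return [[sum(w * Fj[k] for w, Fj in zip(row, F)) for k in range(n_grid)]
            for row in W]


def disagreement(F):
    """Largest spread between agents' inverse CDF values over the grid."""
    return max(max(col) - min(col) for col in zip(*F))


# Assumed doubly stochastic weights on a complete 3-agent graph.
W = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]
# Assumed initial atoms for each agent.
agents = [
    [0.0, 1.0, 2.0, 3.0],
    [5.0, 5.5, 6.0, 9.0],
    [-2.0, 0.5, 1.5, 4.0],
]
grid = [(k + 0.5) / 4 for k in range(4)]
F = [[quantile(a, u) for u in grid] for a in agents]

gaps = [disagreement(F)]
for _ in range(8):
    F = consensus_step(W, F)
    gaps.append(disagreement(F))

# Per-step contraction factor of the disagreement between agents.
ratios = [b / a for a, b in zip(gaps, gaps[1:])]
print(ratios)
```

For this particular `W` the disagreement contracts by the second largest eigenvalue modulus (0.25) at every step, which is the exponential rate referred to in the proof.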

Note that a time-invariant network model is certainly not necessary for exponential convergence but we do not consider further generalisation in this work. It is important to note that the time-varying network connectivity condition allowed in Theorem 1 is also certainly too weak to ensure exponential convergence in general. Indeed, Theorem 1 does not even require the network to be jointly connected until some arbitrary finite future time.

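Corollary 2. Consider a group of agents $\mathcal{V}$ and a connected time-invariant network $\mathcal{G}(\mathcal{V}, \mathcal{E})$. Suppose that $W$ is doubly stochastic and that each agent $i$ updates its state $\mu_i(t) \in \mathcal{U}_2(\mathbb{R})$ according to the update protocol. For any $i \in \mathcal{V}$, we have $\lim_{t \to \infty} \ell_2(\mu_i(t), \mu^*) = 0$ at an exponential rate where

$$\mu^* = \operatorname*{argmin}_{\eta \in \mathcal{U}_2(\mathbb{R})} \frac{1}{n} \sum_{i \in \mathcal{V}} \ell_2(\eta, \mu_i(0))^2.$$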

Proof. This result follows again because linear consensus in $\mathbb{R}$ over a time-invariant network with a doubly stochastic weighting matrix leads asymptotically to 'average' consensus [9]. Looking at (5) we see that (nonlinear) consensus via the update protocol is related to (linear) consensus in the space of inverse cumulative distribution functions. Moving from the limiting inverse cumulative distribution function to a probability measure does not change the limiting $1/n$ averaging coefficient. □
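
The last step of this proof can be spelled out as follows. This is a sketch that assumes the standard identity expressing the 2-Wasserstein distance on the line through inverse cumulative distribution functions; the symbol $G$, denoting a candidate inverse cumulative distribution function on $(0, 1)$, is introduced here for illustration only.

$$\ell_2(\mu, \nu)^2 = \int_0^1 \left| F_\mu^+(u) - F_\nu^+(u) \right|^2 du,$$

$$\lim_{t \to \infty} F_i^+(t) = \frac{1}{n} \sum_{j \in \mathcal{V}} F_j^+(0) = \operatorname*{argmin}_{G} \frac{1}{n} \sum_{j \in \mathcal{V}} \int_0^1 \left| G(u) - F_j^+(0)(u) \right|^2 du.$$

The first equality in the second line uses $W^t \to \frac{1}{n} \mathbf{1}\mathbf{1}^\top$ for a doubly stochastic, primitive $W$. The second is the usual fact that the arithmetic mean minimises a sum of squared $L^2$ distances. An average of non-decreasing functions is again non-decreasing, so the minimiser is itself a valid inverse cumulative distribution function.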

This corollary provides sufficient conditions⁵ under which exponential convergence to a measure is achieved and where the consensus measure achieved asymptotically at each agent minimises the average squared Wasserstein distance to all initial measures. We note that other consensus measures may be more desirable, e.g. one may want to reach an agreement on the measure with the smallest variance within the convex hull of all initial measures.

⁵ A time-invariant network topology and a doubly stochastic weighting matrix. The time-invariance constraint can be relaxed (it is just sufficient) but we do not consider generalisation here.


3.3  Computational  Aspects  of  the  Update  Protocol 

In the case of Gaussian input measures, we have shown that the updating step of our consensus algorithm can be performed in closed form and that the resulting algorithm resembles a particular case of standard linear consensus in $\mathbb{R}$, e.g. see [9]. In the general case, the optimization problem in the update protocol typically does not admit such a closed-form solution. However, it is convex [32, 44]
and  thus  numerical  methods  are  feasible  and  already  exist  in  a  number  of  cases;  see  [43]1MI49]. 

Moreover, consider the important scenario where all the initial input measures are (weighted) empirical measures

$$\mu_i(0)(dx) = \sum_{j=1}^{N} \alpha_j^i\, \delta_{x_{ij}(0)}(dx),$$

where $\delta_y(dx)$ denotes the Dirac delta measure located at $y$, $\alpha_j^i > 0$ and $\sum_{j=1}^{N} \alpha_j^i = 1$. In this case, the minimization in the update protocol can be solved exactly via a finite-dimensional linear program [50, 51] and the resulting measure is also an empirical measure⁶. However, the computational requirements of this linear program may grow quickly with the number of input measures and the number of atoms of these measures; see [33, 34, 50, 51]. Nevertheless, numerous fast approximation methods have been derived [44, 47, 49, 52]. The details of these algorithms are beyond the scope of this work, but it follows that the update operation is thus practical in the important application case involving empirical measures⁷.

Consider, more generally, arbitrary input measures on $\mathbb{R}$. Convexity of the minimisation problem is advantageous in general, but in this case (i.e. with measures on the line) there are yet further virtues. Refer to (5), which is intimately related to the update protocol. The update in (5) is typically computable in closed form and thus we 'almost' have a general closed-form expression for the update protocol already. This relationship between the update protocol and the inverse cumulative distribution function has been explored in [42, 44] with example computations and as a lead into more general computational results. We refer the interested reader to this literature. We also note in conclusion that the computational aspect of this update equation is an ongoing research topic.
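
As a concrete instance of the relationship with (5), consider the restricted case of uniform empirical measures on the line that all have the same number of atoms. Averaging inverse cumulative distribution functions then reduces to averaging sorted atoms. The sketch below covers only that restricted case, which is an assumption made here for simplicity; the function names `barycentre_on_line`, `w2_squared` and `weighted_cost` and the example atoms are hypothetical.

```python
def barycentre_on_line(atom_sets, weights):
    """Atoms of the weighted 2-Wasserstein barycentre on the line.

    Assumes every input is a uniform empirical measure with the same number
    of atoms, and that the weights are non-negative and sum to one.
    """
    n_atoms = len(atom_sets[0])
    if any(len(atoms) != n_atoms for atoms in atom_sets):
        raise ValueError("all measures must have the same number of atoms")
    sorted_sets = [sorted(atoms) for atoms in atom_sets]
    # Average the k-th smallest atoms: the inverse CDF average of (5).
    return [sum(w * s[k] for w, s in zip(weights, sorted_sets))
            for k in range(n_atoms)]


def w2_squared(xs, ys):
    """Squared 2-Wasserstein distance between two uniform empirical measures
    on the line with equal atom counts (sorted atoms are matched in order)."""
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def weighted_cost(candidate, atom_sets, weights):
    """Objective of the update: weighted sum of squared distances."""
    return sum(w * w2_squared(candidate, atoms)
               for w, atoms in zip(weights, atom_sets))


# Illustrative inputs (assumed): two agents, two atoms each, equal weights.
atom_sets = [[2.0, 0.0], [4.0, 10.0]]
weights = [0.5, 0.5]

bary = barycentre_on_line(atom_sets, weights)
cost_bary = weighted_cost(bary, atom_sets, weights)
cost_perturbed = weighted_cost([2.5, 6.0], atom_sets, weights)
print(bary, cost_bary, cost_perturbed)
```

Weighted atoms or unequal atom counts require merging the breakpoints of the inverse cumulative distribution functions and are not handled by this sketch.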

4  Discussion  and  Applications 

The output of the update operation is known as the Wasserstein barycentre in the literature and the limit $\mu^* \in \mathcal{U}_p(\mathbb{R})$ to which all agents converge is similarly a Wasserstein barycentre (of all agents' initial measures). In other words, this work studies the convergence properties of a consensus algorithm concerned with distributed (iterative) computation of the Wasserstein barycentre over a (possibly) time-varying, arbitrary, network topology.

While this is the first such study in this direction, potential applications of the Wasserstein barycentre (itself) have been considered previously in a number of fields [33, 34, 44, 47, 49, 51–54] and this list is by no means exhaustive.

Arguably the most popular domain in which the Wasserstein barycentre has found applications is computer vision and image/video processing [44, 51, 52]. We do not consider specifics here but the interested reader may consult [44] where numerous examples and a detailed discussion are given. It is noted [44] that state-of-the-art advancements in a number of related problems have arisen via the use of Wasserstein barycentres. Importantly, both Gaussian and discrete measures find applicability through the Wasserstein barycentre in computer vision and image/video processing; again see [44].

⁶ As discussed previously, for special, certainly non-generic, arrangements of discrete measures the minimisation in the update protocol may not have a unique solution (though a solution always exists) [44]. We ignore these degenerate special cases (leading to non-uniqueness) since they may only arise under constraints of little practical interest.

⁷ Interestingly, if each input measure is defined by a single Dirac (in $(\mathcal{U}_2(\mathbb{R}), \ell_2)$, i.e. with $p = 2$), then the
classical  (linear)  consensus  algorithm  in  R  is  recovered,  e.g.  as  in  010.  Of  course,  typically  one  is  interested  in 
more  general  empirical  input  measures. 


Applications in machine learning and Bayesian statistics have also made use of the Wasserstein barycentre [43, 49, 53] and it is envisioned that this technology (and the related optimal transportation problem) will find wider adoption in this field. In this setting, distributed (or even parallel) computation of the Wasserstein barycentre is likely important; e.g. distributed Bayesian computation on large data sets is the subject of [53].

Information fusion and distributed estimation (e.g. distributed particle filtering) have been studied in the context of Wasserstein barycentres [33, 34, 43]. This topic is one that may directly use the proposed consensus algorithm [33] (with empirical measures as inputs; e.g. from the output of Monte Carlo estimators like particle filters). Data fusion with correlated Gaussian measures has been shown to be consistent (i.e. not optimistic) with this algorithm when the correlation is ignored [34].

5  Concluding  Remarks 

Distributed  consensus  in  the  Wasserstein  metric  space  of  probability  measures  was  introduced 
in  this  paper.  It  is  shown  that  convergence  of  the  individual  agents’  measures  to  a  common 
measure  value  is  guaranteed  if  a  relatively  weak  network  connectivity  condition  is  satisfied.  The 
measure  that  is  achieved  asymptotically  at  each  agent  is  the  measure  that  minimises  a  weighted 
sum  of  its  Wasserstein  distances  to  these  initial  measures  and  is  known  as  the  Wasserstein 
barycentre  in  the  literature. 

Finally,  we  note  that  following  0,  it  would  be  straightforward  to  consider  an  extension  to 
the  case  in  which  the  network  topology  is  directed  and  one  expects  analogous  results  (concerning 
connectivity)  to  apply  in  the  Wasserstein  space  considered  herein.  For  brevity,  and  notational 
simplicity,  we  do  not  explore  this  scenario  further.  Moreover,  one  may  seek  analogous  results 
in the Wasserstein metric space of measures defined on the Borel sets of $(\mathbb{R}^m, d)$ for some $m \geq 2$. We conjecture that similar results hold in this case. However, while many of the lemmas used herein carry over immediately, this generalization is not straightforward. Indeed, the Wasserstein metric space in such cases is positively curved, so it does not resemble Euclidean space at all and it is not even CAT(0).